Integrated processor/memory device with full width cache



(57) An integrated processor/memory device comprising
a main memory, a CPU, and a full width cache.
The main memory comprises main memory banks. 
Each of the main memory banks stores rows of words. 
The rows are a predetermined number of words wide.
The cache comprises cache banks. Each of the cache 
banks stores one or more cache lines of words. Each of 
the cache lines has a corresponding row in the
corresponding main memory bank. The cache lines are the
predetermined number of words wide. When the CPU
issues an address in the address space of the corre- 
sponding main memory bank, the cache bank deter- 
mines from the address and the tags of the cache lines 
whether a cache bank hit or a cache bank miss has occurred
in the cache bank. When a cache bank miss occurs, the 
cache bank replaces a victim cache line of the cache 
lines with a new cache line that comprises the corre- 
sponding row of the corresponding memory bank spec- 
ified by the issued address. 
Description

The present invention relates generally to integrat- 
ed processor/memory (P/M) devices with an on-chip 
cache and an on-chip main memory. In particular, it per- 
tains to a P/M device with an on-chip cache that is as 
wide as the on-chip main memory (i.e., is full width).

Traditionally, the development of processor and 
memory devices has proceeded independently. Advances
in process technology, circuit design, and integrated
chip (IC) architecture have led to a near exponential
increase in processor speed and memory capacity.
However, memory device latencies have not improved as
dramatically, and access times are increasingly becoming
the limiter of processor performance. This is a problem
known as the Memory Wall and is more fully described in
Hitting the Memory Wall: Implications of the Obvious, by
William A. Wulf and Sally A. McKee, ACM Computer
Architecture News, Vol. 23, No. 1,
March 1995, which is hereby explicitly incorporated by 
reference. 

Current high performance processors, which use 
complex superscalar central processing units (CPUs)
that interface to external off-chip main memory through
a hierarchy of caches, are particularly affected by the 
Memory Wall problem. In fact, this CPU-centric design 
approach requires a large amount of power and chip ar- 
ea to bridge the gap between CPU and memory speeds. 

The Memory Wall problem is commonly addressed 
by adding several levels of cache to the memory system 
so that small, high speed, static random access memory 
(SRAM) devices feed the CPU at low latencies. Com- 
bined with latency hiding techniques, such as prefetch- 
ing and proper code scheduling, it is possible to run a 
high performance processor at reasonable efficiencies 
for applications with enough locality for the caches. 
However, while achieving impressive performance on 
applications that fit nicely into their caches, these proc- 
essors have become increasingly application sensitive. 
For example, large applications such as CAD programs, 
data base applications, or scientific applications often 
fail to meet CPU based speed expectations by a wide 
margin. 

Moreover, the CPU-centric design approach has 
led to very complex superscalar processors with deep
pipelines. Much of this complexity, such as out-of-order 
execution and register scoreboarding, is devoted to hid- 
ing memory system latency. In addition, these proces- 
sors demand a large amount of support logic in terms 
of caches, controllers and data paths to talk to the ex- 
ternal main memory. This adds considerable cost, pow- 
er dissipation, and design complexity. 

To fully utilize a superscalar processor, a large 
memory system is required. The effect of this is to create 
a bottleneck that increases the distance between the 
CPU and main memory. Specifically, it adds interfaces 
and chip boundaries which reduce the available memory
bandwidth due to packaging and connection constraints.

However, integrating the processor with the memory
device avoids most of the problems of the CPU-centric
design approach. Doing so also offers a number of
advantages that effectively compensate for the
technological limitations of a single chip design.

Specifically, in CPU-centric processor designs, the 
instruction and data cache lines have a width that is
significantly less than the width of the main memory.
This is primarily because the time to fill these cache
lines from the off-chip main memory would introduce
severe second order contention effects at the memory
interface of the processor. As a result, such less than
full width caches are unable to take advantage of the
often high spatial locality of instruction and data
streams.

Thus, there is a need for full width instruction and 
data caches that take advantage of the high spatial
locality of instruction and data streams in many
applications. Moreover, the Applicant's corresponding
European Patent Application No. (a copy of which is to
be found on the file of the present European
application) entitled "INTEGRATED PROCESSOR/MEMORY
DEVICE WITH VICTIM DATA CACHE", filed on the same date
as the present application, having Attorney Docket No.
P/2984.EP and hereby explicitly incorporated by
reference, describes and claims the use of a victim data
cache to further improve the miss rate of such a full
width data cache.

Particular and preferred aspects of the invention are 
set out in the accompanying independent and dependent
claims. Features of the dependent claims may be combined
with those of the independent claims as appropriate and
in combinations other than those explicitly set out in
the claims.

In summary, the present invention is an integrated
processor/memory device. It comprises a main memory,
a CPU, and a full width cache.

The main memory has a predefined address space 
and comprises main memory banks. Each of the main 
memory banks occupies a corresponding portion of the
address space and stores rows of words at memory
locations with addresses in the corresponding portion of
the address space. The rows are a predetermined 
number of words wide. 

The cache comprises cache banks. Each of the 
cache banks is coupled to a corresponding main mem- 
ory bank of the main memory banks and the CPU. Each 
of the cache banks comprises a cache bank line storage,
a cache bank tag storage, and cache bank logic. The
cache bank line storage is coupled to the corresponding
main memory bank and stores one or more cache lines of
words. Each of the cache lines has a corresponding row
in the corresponding main memory bank. The cache lines
are the predetermined number of words wide. The cache
bank tag storage stores a corresponding tag for each of
the cache lines. Each of the tags identifies the row in
the corresponding memory bank of the corresponding cache
line. The cache bank logic is coupled to the CPU, the
corresponding memory
bank, and the cache storage. When the CPU issues an 
address in the address space of the corresponding main 
memory bank, the cache bank logic determines from the 
address and the tags of the cache lines whether a cache 
bank hit or a cache miss has occurred in the cache bank 
line storage. When a cache bank miss occurs, the cache 
bank logic replaces a victim cache line of the cache lines 
with a new cache line that comprises the corresponding 
row of the corresponding memory bank specified by the 
issued address. 

Exemplary embodiments of the invention are de- 
scribed hereinafter, by way of example only, with refer- 
ence to the accompanying drawings, in which: 

Figure 1 is a block diagram of an integrated
processor/memory (P/M) device in accordance with the
present invention. 

Figure 2 is a block diagram of the main memory 
bank, the primary data cache bank, and instruction
cache bank of each memory block of the P/M device. 

Figure 3 is a block diagram of the instruction cache 
bank logic of each instruction cache bank. 

Figure 4 is a state diagram of the states of the
instruction cache bank logic of each instruction cache
bank. 

Figure 5 is a block diagram of the primary data 
cache bank logic of each primary data cache bank. 

Figure 6 is a state diagram of the states of the pri- 
mary data cache bank logic of each primary data cache 
bank. 

Figure 7 is a block diagram of the victim cache of 
the P/M device. 

Figure 8 is a block diagram of the victim cache logic 
of the victim cache. 

Figure 9 is a state diagram of the states of the victim 
cache logic. 

Referring to Figure 1, there is shown an exemplary
embodiment of an integrated P/M device 100 in accordance
with the present invention. The integrated components of
the P/M device include a CPU 102, an on-chip memory
system 103, a 64 bit data bus 108, a 25 bit data address
bus 110, a 32 bit instruction bus 112, a 25 bit
instruction address bus 114, and a control bus 116. The
memory system includes 16 memory blocks 104 and a 
victim cache 106. 

Each memory block 104 includes a corresponding main
memory bank 118, a corresponding instruction cache bank
120, and a corresponding data cache bank 122. As will be
evident from the following discussion, the 16 main
memory banks together form the main memory of the P/M
device. The 16 instruction cache banks together form a
direct-mapped instruction cache, while the 16 data cache
banks together form a two-way set-associative data
cache. In addition, the victim cache is a 16-way
fully-associative cache.



Main Memory 

Referring to Figure 2, the main memory bank 118 of each
memory block 104 comprises a 16M bit DRAM that has 4096
(4K) rows of memory cells 123. Each row has 4096 memory
cells. The main memory bank also includes a row decoder
124 that decodes 12 address bits to locate the row
addressed (i.e., identified) by the 12 address bits.
The main memory bank also includes 4096 sense amplifiers
126 that collectively read or write an addressed row of
4096 bits at a time from or to the memory cells of the
addressed row. Since in the exemplary embodiment the
main memory bank comprises a DRAM, access time to the
main memory bank is 6 cycles (e.g., 30 ns).

Since the rows of each main memory bank 118 are 4096
bits or 512 bytes wide, each main memory bank contains
2M bytes and the 16 main memory banks together form a
main memory that contains 32M bytes. Thus, each main
memory bank occupies a 2M byte portion of the 32M byte
main memory address space. Moreover, each byte is
addressable with a 25 bit address A24-A0, where the 4
most significant address bits A24-A21 identify the main
memory bank, the next 12 address bits A20-A9 identify
the row of the main memory bank, and the 9 least
significant address bits A8-A0 identify the byte in the
row.
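
By way of illustration only, the address layout described
above can be sketched in software as follows. The names
decode_address, DecodedAddress, and the constants are
hypothetical and are used here only to show the field
boundaries and the capacity arithmetic stated in the text.

```python
from typing import NamedTuple

BANK_BITS = 4   # A24-A21 select one of 16 main memory banks
ROW_BITS = 12   # A20-A9 select one of 4096 rows in the bank
BYTE_BITS = 9   # A8-A0 select one of 512 bytes in the row


class DecodedAddress(NamedTuple):
    bank: int
    row: int
    byte: int


def decode_address(addr: int) -> DecodedAddress:
    """Split a 25 bit address into its bank, row, and byte fields."""
    if not 0 <= addr < 1 << (BANK_BITS + ROW_BITS + BYTE_BITS):
        raise ValueError("address must fit in 25 bits")
    byte = addr & ((1 << BYTE_BITS) - 1)
    row = (addr >> BYTE_BITS) & ((1 << ROW_BITS) - 1)
    bank = addr >> (BYTE_BITS + ROW_BITS)
    return DecodedAddress(bank, row, byte)


# Capacity figures that follow from the field widths.
ROW_BYTES = 1 << BYTE_BITS                          # 512 bytes per row
BANK_BYTES = (1 << ROW_BITS) * ROW_BYTES            # 2M bytes per bank
MAIN_MEMORY_BYTES = (1 << BANK_BITS) * BANK_BYTES   # 32M bytes in total
```

For example, an address built from bank 3, row 5, and byte
7 decodes back to those three fields, and the derived
constants reproduce the 2M byte and 32M byte figures given
above.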

Instruction Cache 


Still referring to Figure 2, the instruction cache bank
120 of each memory block 104 includes an instruction
cache bank line storage 128. The instruction cache bank
line storage comprises a single long buffer 130 with
4096 latches. The latches of the buffer collectively
store a single long instruction cache line (or block)
that, like each row of the main memory bank 118 of the
memory block, is 4096 bits or 512 bytes wide. Since the
instruction cache line is as wide as each row of the
main memory bank, it is considered full-width. In the
exemplary embodiment, each instruction word is 32 bits
or 4 bytes long. As a result, the instruction cache line
is 128 instruction words wide and so is each row of the
main memory bank that stores instruction words.

Moreover, in each memory block 104, each row of the main
memory bank 118 is indexed (i.e., mapped) to the single
instruction cache line of the instruction cache bank
line storage 128. Thus, all 25 bit instruction addresses
A24-A0 that specify a row in the main memory bank will
include the same index to the instruction cache bank
line storage. This index is the 4 most significant bits
A24-A21 of these addresses and also identifies the main
memory bank.

The instruction cache bank 120 of each memory block 104
also includes an instruction cache bank tag storage 132.
The instruction cache bank tag storage stores a 12 bit
instruction cache line tag that identifies the row in
the corresponding main memory bank 118 normally occupied
by the instruction cache line currently stored (i.e.,
cached) by the instruction cache bank line storage 128.
This tag, as will be explained shortly, is compared by
the instruction cache bank logic 134 with the 12 address
bits A20-A9 of each 25 bit instruction address A24-A0
that is issued and is in the corresponding main memory
bank's portion of the main memory address space.

The operation of the instruction cache bank 120 of each
memory block 104 is controlled by the instruction cache
bank logic 134. Turning now to Figure 3, the instruction
cache bank logic of each instruction cache bank includes
an instruction cache bank control state machine 136, an
instruction cache bank address/tag comparison circuit
138, and an instruction cache bank select circuit 140.
Figure 4 shows the states of operation of the
instruction cache bank logic control state machine.

Referring to Figures 2-4, when the CPU wishes to 
fetch a new instruction word for the instruction pipeline 
of the CPU, it issues a 25 bit instruction address A24-A0 
on the instruction address bus 114 for fetching the in- 
struction word. The issued instruction address specifies 
the memory location of the instruction word in the ad- 
dress space of the main memory. 

In each instruction cache bank 120, the instruction 
cache bank select circuit 140 of the instruction cache 
bank logic 134 receives the 4 most significant bits 
A24-A21 of the issued instruction address from the in- 
struction address bus 114. In response, it decodes these 
4 address bits to determine whether they identify the 
corresponding main memory bank 118 (i.e., whether the
issued address is in the corresponding main memory bank's
portion of the main memory address space). If they do 
identify the corresponding main memory bank, then the 
instruction cache bank select circuit sends a bank select 
signal to the instruction cache bank control state ma- 
chine 136 and the instruction cache bank address/tag 
comparison circuit 138 indicating that the corresponding
main memory bank has been selected. Otherwise, the 
bank select signal indicates that the corresponding main 
memory bank has not been selected and the instruction 
cache bank control state machine remains in an idle 
state (state 137 of Figure 4).

In each instruction cache bank 120, when the bank 
select signal indicates that the corresponding main 
memory bank 118 has been selected, then the instruction
cache bank address/tag comparison circuit 138 compares
the instruction cache line tag currently stored in the
instruction cache bank tag storage 132 with the 12
address bits A20-A9 of the issued instruction address on
the instruction address bus 114. As alluded to earlier,
these 12 address bits identify the memory location of
the row in the corresponding main memory bank where 
the instruction word is stored. 

If there is a match, then the instruction cache bank
address/tag comparison circuit 138 issues an instruction
cache bank hit/miss signal that together with the bank
select signal indicates that an instruction cache bank
hit has occurred. This means that the memory location
specified by the issued instruction address is currently
accessible at the instruction cache line currently
stored by the instruction cache bank line storage 128.

The instruction cache bank hit/miss signal and the bank
select signal from each instruction cache bank 120 are
provided to the CPU 102 via the control bus 116. When
the instruction cache bank hit/miss and bank select
signals from an instruction cache bank indicate that an
instruction cache bank hit has occurred in the
instruction cache bank, this lets the CPU know that the
instruction word will be fetched directly from the
instruction cache bank 120. As a result, the CPU does
not need to stall the instruction pipeline in order to
wait for the instruction to be read from the main memory
bank into the instruction cache bank and then be
fetched, as would have been the case had an instruction
cache bank miss occurred.

In each instruction cache bank 120, the instruction
cache bank hit/miss signal is also provided to the
instruction cache bank control state machine 136. The
instruction cache bank control state machine
additionally receives from the instruction address bus
114 the 7 address bits A8-A2 of the issued instruction
address and the instruction cache line currently stored
by the instruction cache bank line storage 128.

When the instruction cache bank hit/miss and bank select
signals from an instruction cache bank 120 indicate that
an instruction cache bank hit has occurred in the
instruction cache bank, the instruction cache bank
control state machine 136 of the instruction cache bank
leaves its idle state (state 137 of Figure 4) and
decodes the received 7 address bits to determine the
accessible memory location in the instruction cache line
specified by the issued instruction address. It then
fetches the instruction word from this location and
provides it to the CPU 102 (state 139 of Figure 4). This
is done by routing (i.e., multiplexing) the instruction
word onto the instruction bus 112 so that it is received
by the CPU 102. As a result, the fetch of the
instruction word is completed. In the exemplary
embodiment, this is done in a single cycle (e.g., 5 ns).

However, in each instruction cache bank 120, when the
instruction cache bank address/tag comparison circuit
138 determines that there is no match between the
compared instruction cache line tag and the 12 address
bits A20-A9 of the issued instruction address, then it
issues an instruction cache bank hit/miss signal that
together with the bank select signal indicates that an
instruction cache bank miss has occurred. This means
that the location specified by the issued instruction
address is not currently accessible at the instruction
cache line currently stored by the instruction cache
bank line storage 128.

Thus, when the instruction cache bank hit/miss and bank
select signals received by the CPU 102 from an
instruction cache bank 120 indicate that an instruction
cache bank miss has occurred, the CPU stalls so that a
new instruction cache line at the memory location
specified by the issued instruction address can be read
by the instruction cache bank control state machine from
the corresponding main memory bank 118 into the
instruction cache bank (state 141 of Figure 4). In this
case, when the instruction cache bank control state
machine 136 receives these instruction cache bank
hit/miss and bank select signals, it issues to the main
memory bank 118 a W/R control signal indicating that a
read is to occur and the 12 address bits A20-A9 received
from the instruction address bus 114. In response to the
12 address bits, the row decoder locates the row of the
main memory bank identified by the 12 address bits. In
response to the W/R control signal, the sense amplifiers
126 read out this row as the new instruction cache line.
While this is occurring, the instruction cache bank
control state machine 136 issues buffer control signals
to the buffer 130 of the instruction cache bank line
storage 128. In response, the buffer latches the new
instruction cache line received from the sense
amplifiers 126 and in doing so replaces the previous
instruction cache line that was latched by the buffer.
In the exemplary embodiment this requires 6 cycles to
perform, including 1 cycle to determine that an
instruction cache bank miss occurred, 4 cycles of
pre-charging the sense amplifiers and address bit
decoding by the row decoder, and 1 cycle to latch into
the buffer the new instruction cache line read out by
the sense amplifiers.

In each instruction cache bank, once a new instruction
cache line has been stored in the instruction cache bank
line storage 128, the instruction cache bank control
state machine 136 decodes the 7 address bits A8-A2 of
the issued instruction address to locate the instruction
word in the new instruction cache line. It then fetches
the located instruction word from the instruction cache
line and routes it to the CPU 102 in the manner
described earlier (state 139 of Figure 4). As indicated
previously, in the exemplary embodiment, this is done in
a single cycle. After this is accomplished, it returns
to an idle state (state 137 of Figure 4) and waits for
the next issued instruction address.
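
By way of illustration only, the fetch sequence described
above (bank select, tag comparison, line fill on a miss,
and word selection) can be modelled behaviourally as
follows. The class and method names are hypothetical, the
main memory bank is represented by a simple mapping from
row number to a list of 128 words, and the cycle
accounting assumes that the 6 cycle line fill and the 1
cycle word fetch are simply added on a miss; the text does
not state whether the device overlaps them.

```python
class InstructionCacheBank:
    """Behavioural sketch of one instruction cache bank (hypothetical names)."""

    HIT_CYCLES = 1          # word fetch from the cached line
    MISS_FILL_CYCLES = 6    # detect miss, pre-charge/decode, latch new line

    def __init__(self, bank_id, main_memory_bank):
        self.bank_id = bank_id            # value of A24-A21 that selects this bank
        self.memory = main_memory_bank    # mapping: row number -> list of 128 words
        self.tag = None                   # row currently cached, None when empty
        self.line = None                  # the single full-width cache line

    def fetch(self, addr):
        """Return (word, cycles), or None when this bank is not selected."""
        if (addr >> 21) & 0xF != self.bank_id:
            return None                   # bank select not asserted: stay idle
        row = (addr >> 9) & 0xFFF         # A20-A9, compared with the tag
        word = (addr >> 2) & 0x7F         # A8-A2, selects 1 of 128 words
        cycles = 0
        if self.tag != row:               # cache bank miss: replace the line
            self.line = list(self.memory[row])
            self.tag = row
            cycles += self.MISS_FILL_CYCLES
        cycles += self.HIT_CYCLES         # route the word to the CPU
        return self.line[word], cycles
```

In this sketch a first fetch from a row costs 7 cycles and
each subsequent fetch from the same row costs 1 cycle,
which is the behaviour the full-width line is intended to
exploit.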

In view of the foregoing, it is clear that the 16 in- 
struction cache banks 120 together form a direct- 
mapped on-chip instruction cache memory that contains
8K bytes. Since the instruction cache line stored by each 
instruction cache bank is full-width, the cache miss rate 
is greatly reduced over conventional processors with in- 
struction cache lines that are less than full-width. This 
low cache miss rate is due to the prefetching effect of 
the long instruction cache line and the usually high spa- 
tial locality found in instruction streams. 
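
By way of illustration only, the prefetching effect of a
long cache line on a sequential instruction stream can be
shown with a small count of misses. The function
count_misses is hypothetical, it models a single cached
line rather than the complete 16 bank cache, and the 32
byte line used for comparison is an assumed example of a
less than full-width line, not a figure taken from any
particular processor.

```python
def count_misses(addresses, line_bytes):
    """Count misses for a cache that holds one line of line_bytes bytes."""
    misses = 0
    cached_line = None
    for addr in addresses:
        line = addr // line_bytes      # which line the address falls in
        if line != cached_line:        # tag mismatch: the line is replaced
            misses += 1
            cached_line = line
    return misses


# 1024 sequential 4 byte instruction words (4096 bytes of straight-line code).
stream = [4 * i for i in range(1024)]
full_width_misses = count_misses(stream, 512)   # full-width 512 byte line
narrow_misses = count_misses(stream, 32)        # assumed 32 byte line
```

For this purely sequential stream the 512 byte line misses
8 times and the 32 byte line misses 128 times. Real
instruction streams contain branches, so the ratio
observed in practice would depend on the application.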

Moreover, conventional processors with off-chip 
main memory and on-chip instruction caches are unable 
to reap the benefit of a full-width instruction cache line. 
This is due to the severe second order contention effects 
that would be introduced at the memory interface in 
reading such a full-width cache line from the main
memory to the instruction cache. However, in the present
invention, these contention effects are eliminated
because both the instruction cache banks 120 and the main
memory banks 118 are on-chip. Thus, in the exemplary
embodiment, an entire full-width instruction cache line
can be read from a main memory bank into the
corresponding instruction cache bank in 6 cycles.

Data Cache and Victim Data Cache

Referring again to Figure 2, the primary data cache 
bank 122 of each memory block 104 includes a primary 
data cache bank line storage 144 that comprises two 
buffers 146. Like the buffer 130 of each instruction
cache bank line storage 128, each buffer of the primary
data cache bank line storage includes 4096 latches that
together store a primary data cache line that is 4096
bits or 512 bytes wide. Moreover, in the exemplary
embodiment, each primary data cache line is 64 data
words wide with each data word being 64 bits or 8 bytes
long.

In each memory block 104, each row of the main 
memory bank 118 is indexed to both of the primary data 
cache lines of the primary data cache bank line storage 
144, as well as being indexed to the instruction cache
line of the instruction cache bank line storage 128. Thus,
all 25 bit data addresses A24-A0 that specify a row in 
the main memory bank will include the same index to 
the primary data cache bank line storage. Similar to the 
instruction addresses, this index is the 4 most significant
bits A24-A21 of the data addresses and also identifies 
the main memory bank. 

The primary data cache bank 122 of each memory block 104
also includes a primary data cache bank tag/flag storage
148. The primary data cache bank tag/flag storage stores
a corresponding 12 bit primary data cache line tag and a
corresponding dirty flag for each of the two primary
data cache lines currently stored by the primary data
cache bank line storage 144. Each tag identifies the row
in the corresponding main memory bank 118 normally
occupied by the corresponding primary data cache line.
These tags are compared by the primary data cache bank
logic 150 with the 12 address bits A20-A9 of each 25 bit
data address A24-A0 that is issued and is in the
corresponding main memory bank's portion of the main
memory address space. Each dirty flag identifies whether
the corresponding primary data cache line is dirty
(i.e., contains one or more data words that have been
written into the primary data cache line but not yet to
the main memory bank). Additionally, the primary data
cache bank tag/flag storage stores a least recently used
(LRU) flag that identifies which of the primary data
cache lines was least recently used (i.e., accessed).
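
By way of illustration only, the contents of one primary
data cache bank (two full-width lines, a tag and a dirty
flag per line, and a single LRU flag) can be sketched as a
data structure. All names are hypothetical. The word
select field A8-A3 is inferred from the stated line width
of 64 words of 8 bytes, and handling of a miss is
deliberately left out because it is not covered by the
passage above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataCacheLine:
    tag: Optional[int] = None     # 12 bit row number (A20-A9), None when empty
    dirty: bool = False           # written in the cache but not yet in main memory
    words: List[int] = field(default_factory=lambda: [0] * 64)


@dataclass
class PrimaryDataCacheBank:
    lines: List[DataCacheLine] = field(
        default_factory=lambda: [DataCacheLine(), DataCacheLine()])
    lru: int = 0                  # index of the least recently used line

    def lookup(self, addr: int) -> Optional[int]:
        """Compare A20-A9 with both tags; return the hit line index or None."""
        row = (addr >> 9) & 0xFFF
        for way, line in enumerate(self.lines):
            if line.tag == row:
                self.lru = 1 - way            # the other line is now the LRU one
                return way
        return None

    def write_word(self, addr: int, value: int) -> bool:
        """On a hit, write the word and mark the line dirty; False on a miss."""
        way = self.lookup(addr)
        if way is None:
            return False
        self.lines[way].words[(addr >> 3) & 0x3F] = value   # A8-A3 (inferred)
        self.lines[way].dirty = True
        return True
```

A single bit suffices for the LRU flag in this sketch
because each bank holds only two lines, so an access to
one line always leaves the other as the least recently
used.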

The operation of the primary data cache bank 122 of each
memory block 104 is controlled by the primary data cache
bank logic 150. As shown in Figure 5, the primary data
cache bank logic of each primary data cache bank
includes a primary data cache bank control state machine
152, a primary data cache bank address/tag comparison
circuit 154, and a primary data cache bank select
circuit 156. Figure 6 shows the states of operation of
the primary data cache bank logic control state machine.

Referring to Figure 7, and as will be explained in
greater detail later, the victim data cache is used to
store victim data cache sub-lines (or sub-blocks) of
primary data cache lines that were recently replaced
(i.e., were replacement victims) with new primary data
cache lines in the primary data cache banks 122. The
victim data cache includes a victim data cache line
storage 160 that comprises 16 buffers 162. Each buffer
of the victim data cache line storage includes 256
latches that together store a victim data cache sub-line
that is 256 bits or 32 bytes wide. Thus, in the
exemplary embodiment, each victim data cache sub-line is
4 data words wide.

The victim data cache 106 also includes a victim data
cache tag/flag storage 164. The victim data cache
tag/flag storage stores a corresponding 22 bit tag for
each of the 16 victim data cache sub-lines currently
stored by the victim data cache line storage 160. Each
tag identifies the corresponding victim data cache
sub-line and indicates the memory location it normally
occupies in the main memory. These tags are compared by
the victim data cache logic 166 with the 19 address bits
A24-A6 of each 25 bit data address A24-A0 that is
issued. Additionally, the victim data cache tag/flag
storage stores a flush flag that identifies which of the
victim data cache sub-lines is to be flushed the next
time a new victim data cache sub-line is written into
the victim data cache.

The operation of the victim data cache 106 is controlled
by the victim data cache logic 166. As shown in Figure
8, the victim data cache logic includes a victim data
cache control state machine 168 and a victim data cache
address/tag comparison circuit 170. Figure 9 shows the
states of operation of the victim data cache logic
control state machine.

Referring to Figure 1, the CPU issues a 25 bit data
address A24-A0 on the data address bus 110 when it
wishes to read or write a data word from or to the main
memory. The issued data address specifies the memory
location in the address space of the main memory at
which the data word is to be read or written. The CPU
also issues a write/read (W/R) signal on the control bus 
116 that indicates whether a read or write is occurring. 

Turning now to Figures 7-9, each time a data address is
issued by the CPU 102, the victim data cache address/tag
comparison circuit 170 of the victim data cache 106
compares the tags currently stored in the victim data
cache tag/flag storage 164 with the 19 address bits
A24-A6 of the issued data address on the data address
bus 110. If there is a match, then the victim data cache
address/tag comparison circuit issues a victim data
cache hit/miss signal that indicates that a victim data
cache hit has occurred. This means that the memory
location addressed by the issued data address is
currently accessible at one of the victim data cache
sub-lines stored in the victim data cache line storage
160. The victim data cache hit/miss signal also
identifies the victim data cache sub-line in which the
victim data cache hit occurred. But, if there is no
match, then this means that the memory location
addressed by the issued data address is not currently
accessible at one of the victim data cache sub-lines
stored in the victim data cache line storage, and the
victim data cache address/tag comparison circuit issues
a victim data cache hit/miss signal that indicates that
a victim data cache miss has occurred.

Unlike conventional victim data caches, the victim 
data cache 106 in the exemplary embodiment is not 
used to write back victim data cache sub-lines to the pri- 
mary data cache banks 122. In other words, the victim 
data cache cannot write a data word into a victim data 
cache sub-line and then write back the dirty victim data 
cache sub-line to the corresponding primary data cache 
bank. This is due to the timing and architectural
constraints discussed later.

The victim data cache control state machine 168 receives
the W/R signal from the CPU 102 on the control bus, the
victim data cache hit/miss signal from the victim data
cache address/tag comparison circuit 170, the 6 address
bits A8-A3 of the issued data address on the data
address bus 110, and the victim data cache sub-lines
currently stored by the victim data cache line storage
160. When the W/R signal indicates that a read is
occurring and the victim data cache hit/miss signal
indicates that a victim data cache hit has occurred,
then the victim data cache control state machine leaves
its idle state (state 171 of Figure 9) and decodes the
received 6 address bits to determine the accessible
memory location of the data word in the identified
victim data cache sub-line at which the data word is to
be read. The victim data cache control state machine
then reads the data word from the identified victim data
cache sub-line and provides it to the CPU 102 (state 173
of Figure 9). This is done by routing the data word in
the identified victim data cache sub-line onto the data
bus 108 so that it is received by the CPU. In the
exemplary embodiment, only a single cycle is required to
access the victim data cache and read a victim data
cache sub-line to the CPU.

However, when the W/R signal received from the 
CPU 102 indicates that a write is occurring or when a 
victim data cache hit/miss signal is issued indicating that
a victim data cache miss has occurred, then the victim 
data cache control state machine 168 remains in an idle 
state (state 171 of Figure 9). In this case, the data word 
that is to be written or read at the memory location spec- 
ified by the issued data address must be written to or 
read from the primary data cache bank 122 in the memory
block 104 with the corresponding main memory
bank 118 that has the memory location specified by the 
issued data address. 

The CPU 102 also receives the victim data cache hit/miss
signal. Thus, when the CPU receives a victim data cache
hit/miss signal indicating that a victim data cache hit
has occurred during a read, it waits for the data word
at the memory location specified by the issued data
address to be provided to it by the victim data cache
106 via the data bus 108. However, when the CPU receives
a victim data cache hit/miss signal that indicates that
a victim data cache hit has occurred during a write, or
receives a victim data cache hit/miss signal indicating
that a victim data cache miss has occurred, it
determines whether a primary data cache bank hit or miss
signal has been issued by the primary data cache bank
122 corresponding to the main memory bank 118 with the
memory location specified by the issued data address.

Referring to Figures 2, 5, and 6, in each primary data
cache bank 122, the primary data cache bank select
circuit 156 of the primary data cache bank logic 150
operates in the same way as the instruction cache bank
select circuit 140 of the instruction cache bank logic 134
of each instruction cache bank 120. Thus, it receives the
4 most significant bits A24-A21 of the issued data
address from the data address bus 110. In response, it
decodes those 4 address bits to determine whether they
identify the corresponding main memory bank 118. If
they do identify the corresponding main memory bank,
then the primary data cache bank select circuit sends a
bank select signal to the primary data cache bank
control state machine 152 and the primary data cache bank
address/tag comparison circuit 154 indicating that the
corresponding main memory bank has been selected.
Otherwise, the bank select signal indicates that the
corresponding main memory bank has not been selected
and the primary data cache bank control state machine
remains in an idle state (state 157 of Figure 6).

In each primary data cache bank 122, when the
bank select signal indicates that the corresponding main
memory bank 118 has been selected, then the primary
data cache bank address/tag comparison circuit 154
compares the primary data cache line tags currently
stored in the primary data cache bank tag/flag storage
148 with the 12 address bits A20-A9 of the issued data
address on the data address bus 110. These 12 address
bits identify the memory location of the row in the
corresponding main memory bank where the data word is
currently stored for a read or is to be stored for a write.
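
By way of illustration only, the extraction of the bank select bits and the row bits from an issued data address may be sketched as follows. The helper names bank_field, row_field, and bank_selected are hypothetical and do not appear in the exemplary embodiment; only the bit positions A24-A21 and A20-A9 are taken from the description above.

```python
# Bit positions taken from the description: A24-A21 select the bank,
# A20-A9 select the row within that bank. All names are hypothetical.
BANK_SHIFT, BANK_BITS = 21, 4
ROW_SHIFT, ROW_BITS = 9, 12


def bank_field(address: int) -> int:
    """Return the 4 bank select bits A24-A21 of the issued address."""
    return (address >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


def row_field(address: int) -> int:
    """Return the 12 row bits A20-A9 that are compared against the tags."""
    return (address >> ROW_SHIFT) & ((1 << ROW_BITS) - 1)


def bank_selected(address: int, my_bank: int) -> bool:
    """Model of the bank select signal for the bank numbered my_bank."""
    return bank_field(address) == my_bank
```

In this sketch each bank would evaluate bank_selected with its own bank number, so that at most one bank leaves its idle state for a given address.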

If there is a match, then the primary data cache bank
address/tag comparison circuit 154 issues a primary data
cache bank hit/miss signal that together with the bank
select signal indicates that a primary data cache bank
hit has occurred. This means that the memory location
addressed by the issued data address is currently
accessible at one of the primary data cache lines stored
in the primary data cache bank line storage 144. The
primary data cache bank hit/miss signal also identifies
this primary data cache line. On the other hand, if there
is no match, then this means that the memory location
addressed by the issued data address is not currently
accessible at one of the primary data cache lines stored
in the primary data cache bank line storage 144 and the
primary data cache bank address/tag comparison circuit
issues a primary data cache bank hit/miss signal that
together with the bank select signal indicates that a
primary data cache bank miss has occurred.
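
The combination of the bank select signal and the tag comparison just described may be sketched, under stated assumptions, as follows. The function names and the "idle", "hit", and "miss" labels are hypothetical; a tag of None is assumed to stand for a line that holds no valid row, which the description does not address.

```python
from typing import List, Optional, Tuple


def compare_tags(tags: List[Optional[int]], row_bits: int) -> Optional[int]:
    """Return the index of the cache line whose tag matches, or None."""
    for index, tag in enumerate(tags):
        if tag is not None and tag == row_bits:
            return index
    return None


def bank_status(bank_selected: bool,
                tags: List[Optional[int]],
                row_bits: int) -> Tuple[str, Optional[int]]:
    """Combine bank select and tag comparison into idle, hit, or miss."""
    if not bank_selected:
        return ("idle", None)      # bank not addressed, no hit or miss
    hit_line = compare_tags(tags, row_bits)
    if hit_line is None:
        return ("miss", None)
    return ("hit", hit_line)       # the hit also identifies the line
```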

In each primary data cache bank 122, the primary
data cache bank hit/miss signal is provided to the
primary data cache bank control state machine 152. The
victim data cache hit/miss signal from the victim data
cache 106 is also provided to the primary data cache
bank control state machine on the control bus 116 along
with the W/R signal from the CPU.

As indicated earlier, when a victim data cache hit/miss
signal indicating a victim data cache hit is issued
during a read, then the victim data cache 106 provides
the CPU with the data word at the memory location
addressed by the issued data address. Thus, in each
primary data cache bank 122, when the primary data
cache bank control state machine 152 receives a victim
data cache hit/miss signal indicating a victim data cache
hit and a W/R signal indicating a read, then it remains
in an idle state (state 157 of Figure 6). This is true even
when the primary data cache bank hit/miss signal it
receives from the primary data cache bank address/tag
comparison circuit 154 indicates that a primary data
cache bank hit has occurred.

However, when a victim data cache hit/miss signal
indicating a victim data cache hit is issued during a write
or when a victim data cache hit/miss signal indicating a
victim data cache miss is issued, then the victim data
cache 106 is not used to access the location addressed
by the issued data address. Thus, in each primary data
cache bank 122, in either of the two conditions just
described, the primary data cache bank control state
machine 152 controls the reading and writing of a data word
at the memory location specified by the issued data
address in either case where the primary data cache bank
hit/miss and bank select signals indicate a primary data
cache bank hit or miss has occurred.
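
The division of work between the victim data cache and the selected primary data cache bank described above reduces to a small decision rule, which may be sketched as follows. The function name and the returned labels are hypothetical and are used for illustration only.

```python
def servicing_unit(victim_hit: bool, is_write: bool) -> str:
    """Decide which unit services an access, per the description above.

    Only a victim data cache hit during a read is serviced by the
    victim data cache; a victim hit during a write, or any victim miss,
    is handled by the selected primary data cache bank.
    """
    if victim_hit and not is_write:
        return "victim_data_cache"
    return "primary_data_cache_bank"
```

Under this rule the primary data cache bank control state machine would stay idle exactly when the function returns "victim_data_cache".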

The primary data cache bank hit/miss and bank select
signals from each primary data cache bank 122 are
also provided to the CPU 102 via the control bus 116.
When these signals from a primary data cache bank
indicate that a primary data cache bank hit has occurred
and either a victim data cache hit/miss signal that
indicates that a victim data cache hit has occurred is
received during a write or a victim data cache hit/miss
signal indicating that a victim data cache miss has occurred
is received, the CPU knows that the data word can be read
or written directly from or to the primary
data cache bank 122. The CPU then does not stall the
instruction pipeline in order to wait for a primary data
cache line with an accessible memory location specified
by the issued data address to be read from the main
memory bank into the primary data cache bank, as would
have been the case had a primary data cache bank miss
occurred.

In each primary data cache bank 122, in addition to
the primary data cache bank hit/miss, bank select, and
W/R signals, the primary data cache bank control state
machine 152 receives from the data address bus 110
the 6 address bits A8-A3 of the issued data address and
the primary data cache lines currently stored by the
primary data cache bank line storage 144. When the
primary data cache bank hit/miss and bank select signals
indicate that a primary data cache bank hit has occurred,
the primary data cache bank control state machine 152
decodes the received 6 address bits to determine the
accessible memory location specified by the issued data
address in the primary data cache line identified by the
primary data cache bank hit/miss signal as being the
primary data cache line in which the primary data cache
bank hit occurred. If the W/R signal indicates a read,
then the primary data cache bank control state machine
reads the data word from the determined location in the
identified data cache line and provides it to the CPU 102
(state 159 of Figure 6). This is done by routing the data
word in the identified data cache line onto the data bus
108 so that it is received by the CPU. In the exemplary
embodiment, only a single cycle is required to access
the primary data cache bank and read the data word to
the CPU. Once the read is completed, the primary data
cache bank control state machine returns to an idle state
(state 157 of Figure 6).

But, if the W/R signal indicates a write, the primary
data cache bank control state machine 152 writes a data
word from the CPU to the determined location in the
identified data cache line. This is done by routing the
data word from the data bus 108 to the buffer 146 in the
primary data cache bank line storage 144 that stores the
identified primary data cache line and issuing buffer
control signals that cause the buffer to latch the data word
(state 159 of Figure 6). Then, if the corresponding dirty
flag for the identified primary data cache line does not
already indicate that the primary data cache line is dirty,
the primary data cache bank control state machine
updates it to indicate that it is now dirty (sub-state 161
of Figure 6). This is done by providing the updated dirty
flag to the primary data cache bank tag/flag storage 148
and issuing storage control signals that cause the
primary data cache bank tag/flag storage to store the
updated dirty flag. Once the write is completed, the primary
data cache bank control state machine returns to an idle
state (state 157 of Figure 6).
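
A minimal sketch of the read hit and write hit handling described above is given below. It assumes a 4096-bit cache line holding 64 data words of 64 bits, so that a 6-bit word index selects one data word; the CacheLine class and the read_hit and write_hit helpers are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List

WORDS_PER_LINE = 64  # assumed: 4096-bit line / 64-bit data word


@dataclass
class CacheLine:
    words: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)
    dirty: bool = False


def _check(word_index: int) -> None:
    # A 6-bit index can only address words 0..63.
    if not 0 <= word_index < WORDS_PER_LINE:
        raise IndexError("word index must fit in 6 address bits")


def read_hit(line: CacheLine, word_index: int) -> int:
    """Return the data word selected by the 6 decoded address bits."""
    _check(word_index)
    return line.words[word_index]


def write_hit(line: CacheLine, word_index: int, value: int) -> None:
    """Latch the data word, then set the dirty flag if not yet set."""
    _check(word_index)
    line.words[word_index] = value
    if not line.dirty:
        line.dirty = True
```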

However, when the CPU receives primary data
cache bank hit/miss and bank select signals from a
primary data cache bank 122 that indicate that a primary
data cache bank miss has occurred and either receives
during a write a victim data cache hit/miss signal that
indicates that a victim data cache hit has occurred or
receives a victim data cache hit/miss signal indicating that a
victim data cache miss has occurred, it stalls while a new
primary data cache line with the memory location
specified by the issued data address is read from the
corresponding main memory bank 118 into the primary data
cache bank. This also requires writing to the main memory
bank the victim primary data cache line being
replaced by the new primary data cache line if the
corresponding dirty flag for the victim primary data cache line
indicates that it is dirty. In this case, the CPU will be
additionally stalled.

In each primary data cache bank 122, the primary
data cache bank control state machine 152 also
receives from the primary data cache bank tag/flag
storage 148 the dirty flags for the primary data cache lines
stored by the primary data cache bank line storage 144
in order to determine whether they are dirty. As
described earlier, each dirty flag is updated to indicate that
the corresponding primary data cache line is dirty
whenever a data word is written to the corresponding primary
data cache line and the dirty flag does not yet indicate
that the primary data cache line is dirty.

The primary data cache bank control state machine
152 of each primary data cache bank 122 also receives
the LRU flag from the primary data cache bank tag/flag
storage 148 of the primary data cache bank. As
mentioned previously, the LRU flag identifies the primary data
cache line that was least recently used. The LRU flag
is updated by the primary data cache bank control state
machine each time that a different primary data cache
line is accessed for a read or a write. The updated LRU
flag is then provided to the primary data cache bank
tag/flag storage and stored in it with storage control signals
issued by the primary data cache bank control state
machine.
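
For a bank that stores two primary data cache lines, the LRU flag update described above may be sketched as a single bit, as follows. The assumption of exactly two lines per bank, and the function name update_lru, are introduced here for illustration and are not a statement of how the flag is encoded in the exemplary embodiment.

```python
def update_lru(accessed_line: int) -> int:
    """Return the new LRU flag for a bank assumed to hold two lines.

    After line 0 is accessed, line 1 is the least recently used line,
    and vice versa. Re-accessing the same line leaves the flag as is.
    """
    if accessed_line not in (0, 1):
        raise ValueError("this sketch assumes two lines per bank")
    return 1 - accessed_line
```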

In each primary data cache bank 122, when the
primary data cache bank control state machine 152 of the
primary data cache bank receives primary data cache
bank hit/miss and bank select signals indicating that a
primary data cache bank miss has occurred and either
receives a victim data cache hit/miss signal indicating
that a victim data cache hit has occurred and a W/R
signal indicating that a write is occurring or receives a victim
data cache hit/miss signal indicating that a victim data cache
miss has occurred, then this means that a new primary
data cache line with the memory location specified by
the issued data address must be read from the
corresponding main memory bank 118. However, prior to
doing so, the primary data cache bank control state
machine determines from the LRU flag which of the
currently stored primary data cache lines is the least recently
used one and therefore will be the victim primary data
cache line that will be replaced by the new primary data
cache line.

However, in each primary data cache bank 122, prior
to replacing the victim data cache line with a new data
cache line, the primary data cache bank control state
machine 152 writes back to the corresponding main
memory bank 118 the victim primary data cache line if
it is dirty (state 163 of Figure 6). The primary data cache
bank control state machine does so by first determining
from the corresponding dirty flag provided by the primary
data cache bank tag/flag storage 148 whether the victim
primary data cache line is dirty. If it is dirty, then the
primary data cache bank control state machine issues
on the control bus 116 a dirty cache line write signal
indicating that it needs to write back a dirty victim primary
data cache line. This signal is received by the CPU 102
and in response the CPU stalls to allow the dirty victim
primary data cache line to be written back to the
corresponding main memory bank and the new primary data
cache line to be read into the primary data cache bank.
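
The ordering just described (select the LRU line as the victim, write it back if dirty, then read in the new line) may be sketched in software as follows. This is a behavioural model under stated assumptions only: the Bank class, its attribute names, the two-line organization, and the use of a dictionary to stand for the main memory bank are all hypothetical.

```python
from typing import Dict, List, Optional


class Bank:
    """Hypothetical model of one primary data cache bank with two lines."""

    def __init__(self, memory: Dict[int, List[int]]):
        self.memory = memory                      # row number -> row words
        self.tags: List[Optional[int]] = [None, None]
        self.lines: List[Optional[List[int]]] = [None, None]
        self.dirty = [False, False]
        self.lru = 0                              # index of the LRU line

    def handle_miss(self, row: int) -> bool:
        """Replace the LRU line with `row`; return True if it wrote back."""
        victim = self.lru
        wrote_back = False
        if self.tags[victim] is not None and self.dirty[victim]:
            # The victim's tag names the row it must be written back to.
            self.memory[self.tags[victim]] = list(self.lines[victim])
            wrote_back = True
        self.lines[victim] = list(self.memory[row])  # read the new line
        self.tags[victim] = row
        self.dirty[victim] = False
        self.lru = 1 - victim                     # other line is now LRU
        return wrote_back
```

A return value of True corresponds to the case in which the dirty cache line write signal would be issued and the CPU additionally stalled.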

The primary data cache bank control state machine
152 writes back the dirty victim primary data cache line
by issuing to the corresponding main memory bank 118
a W/R control signal indicating that a write is to occur
and the 12 address bits A20-A9 received from the
corresponding tag for the dirty victim primary data cache
line provided by the primary data cache bank tag/flag
storage 148. Moreover, the primary data cache bank
control state machine issues buffer control signals to the
buffer 146 that stores the dirty victim primary data cache
line being written back so that the dirty victim primary
data cache line is routed to the sense amplifiers 126 of
the corresponding main memory bank. In response to
the 12 address bits, the row decoder 124 locates the
row of the corresponding main memory bank that is
identified by the 12 address bits. And, in response to the
W/R control signal, the sense amplifiers 126 write the
provided dirty victim primary data cache line into the
identified row of the corresponding main memory bank.
In the exemplary embodiment, 6 cycles are required to
write back a dirty victim primary data cache line,
including 1 cycle to determine that a primary data cache bank
miss occurred and to identify a dirty victim primary data
cache line, 4 cycles of pre-charging the sense amplifiers
and address bit decoding by the row decoder, and 1
cycle to write the dirty victim primary data cache line into
the main memory bank.

Moreover, since the CPU is stalled while a dirty vic- 
tim primary data cache line is being written back, the 
other primary data cache banks 122 each write back a 
dirty primary data cache line to the corresponding main 
memory bank 118 if it stores at least one dirty primary 
data cache line (state 163 of Figure 6). In each of these 
other primary data cache banks, this is done when the 
dirty cache line write signal on the control bus 116 indi- 
cates that a dirty victim primary data cache line is being 
written back and the bank select signal indicates that 
the corresponding main memory bank has not been se- 
lected. This write back is controlled by the primary data 
cache bank control state machine 152 of each of these 
other primary data cache banks in a similar manner to 
that just described. However, if there is only one dirty 
primary data cache line stored by a primary data cache 
bank, then it is written back. But, if there are two dirty 
primary data cache lines, then the dirty primary data 
cache line identified by the LRU flag as being the LRU 
primary data cache line is the one that is written back. 
Once this write back is completed, then the primary data 
cache bank control state machine returns to an idle state 
(state 157 of Figure 6). 
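
The selection rule used by the other, non-selected primary data cache banks during the stall may be sketched as follows. The function name and the list representation of the dirty flags are hypothetical, and two lines per bank are assumed.

```python
from typing import List, Optional


def line_to_write_back(dirty: List[bool], lru: int) -> Optional[int]:
    """Pick the line a non-selected bank writes back during the stall.

    No dirty line: nothing is written back. One dirty line: that line.
    Two dirty lines: the one identified by the LRU flag.
    """
    dirty_lines = [index for index, flag in enumerate(dirty) if flag]
    if not dirty_lines:
        return None
    if len(dirty_lines) == 1:
        return dirty_lines[0]
    return lru
```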



In each primary data cache bank 122, once a dirty 
victim primary data cache line has been written back to 
the corresponding main memory bank 118 or when the 
corresponding dirty flag for the victim primary data 
cache line indicates that it is not dirty, then the primary 
data cache bank control state machine 152 reads the 
new primary data cache line with the location specified 
by the issued data address from the corresponding main 
memory bank 118 into the primary data cache bank 
(state 163 of Figure 6). This is done by issuing to the
corresponding main memory bank 118 a W/R control
signal indicating that a read is to occur and the 12
address bits A20-A9 of the issued data address on the data
address bus 110. In response to the 12 address bits, the
row decoder 124 locates the row of the corresponding 
main memory bank that is identified by the 12 address 
bits. And, in response to the W/R control signal, the 
sense amplifiers 126 read out the new primary data 
cache line from the identified row of the corresponding 
main memory bank. Moreover, the primary data cache 
bank control state machine issues buffer control signals 
to the buffer 146 that stores the victim primary data 
cache line being replaced so that the new primary data 
cache line is latched by the buffer and replaces the vic- 
tim primary data cache line. In the exemplary embodi- 
ment, this requires 5 cycles including 4 cycles of pre- 
charging the sense amplifiers and address bit decoding 
by the row decoder and 1 cycle to latch the new primary 
data cache line read out by the sense amplifiers into the 
main memory bank. 
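
The cycle counts stated for the exemplary embodiment can be tabulated as follows. The dictionary keys are descriptive labels chosen for this sketch; only the per-step cycle counts are taken from the description, and no assumption is made here about how the two operations overlap or combine into a total stall time.

```python
# Cycle breakdown stated for writing back a dirty victim line.
WRITE_BACK_CYCLES = {
    "detect_miss_and_identify_dirty_victim": 1,
    "precharge_and_row_decode": 4,
    "write_line_into_main_memory_bank": 1,
}

# Cycle breakdown stated for reading in the new line.
FILL_CYCLES = {
    "precharge_and_row_decode": 4,
    "latch_new_line": 1,
}


def total_cycles(breakdown: dict) -> int:
    """Sum the per-step cycle counts of one operation."""
    return sum(breakdown.values())
```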

But, since accessing the main memory bank 118 to
read out the new primary data cache line requires time
for address bit decoding by the row decoder 124 and
pre-charging of the sense amplifiers 126, this time can
be efficiently used to write the most recently used (MRU)
primary data cache sub-line of the victim primary data
cache line to the victim data cache 106 prior to the new
primary data cache line being latched in the buffer 146.
In order to determine which primary data cache sub-line 
in a victim primary data cache line is the MRU sub-line, 
the primary data cache bank tag/flag storage 148 of 
each primary data cache bank 122 stores an MRU flag 
that identifies the MRU victim data cache sub-line in 
each primary data cache line stored by the primary data 
cache bank line storage 144. In addition, since the data 
bus 108 is not being used during this time, it can be ef- 
ficiently used to write the MRU victim data cache sub- 
line to the victim data cache. 

Therefore, during the time the sense amplifiers are
being pre-charged and the row decoder is decoding
address bits, the primary data cache bank control state
machine 152 identifies the MRU victim data cache
sub-line from the corresponding MRU flag received from the
primary data cache bank tag/flag storage 148. It then
routes the MRU victim data cache sub-line to the victim
data cache 106 (sub-state 165 of Figure 6) using the
data bus 108. In the exemplary embodiment, this is done
in four cycles since the data bus is 64 bits wide or 1 data
word wide and the MRU victim data cache sub-line is
256 bits or 4 data words wide. Thus, the primary data
cache bank control state machine routes a block of 64
bits or 1 data word of the MRU victim data cache
sub-line each cycle onto the data bus during this time period.
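
The four-cycle transfer of a 256-bit sub-line over the 64-bit data bus may be sketched as follows. The order in which the four 64-bit blocks are sent (least significant block first) is an assumption made for this sketch, as are the function names bus_beats and latch_beats.

```python
from typing import List

BUS_BITS = 64
SUB_LINE_BITS = 256
BEATS = SUB_LINE_BITS // BUS_BITS  # 4 cycles, one 64-bit block per cycle


def bus_beats(sub_line: int) -> List[int]:
    """Split a 256-bit sub-line into the 64-bit blocks sent each cycle."""
    mask = (1 << BUS_BITS) - 1
    return [(sub_line >> (BUS_BITS * i)) & mask for i in range(BEATS)]


def latch_beats(beats: List[int]) -> int:
    """Reassemble the sub-line from the blocks latched at the receiver."""
    value = 0
    for i, block in enumerate(beats):
        value |= block << (BUS_BITS * i)
    return value
```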

Referring again to Figures 7-9, the victim data
cache control state machine 168 receives from the
control bus 116 the primary data cache bank hit/miss and
bank select signals from each primary data cache bank
122 and also the dirty cache line write signal. When the
primary data cache bank hit/miss and bank select
signals from a primary data cache bank indicate that a
primary data cache bank miss has occurred and the dirty
cache line write signal indicates that a dirty victim
primary data cache line is not being written back, then the
victim data cache control state machine writes the MRU
victim data cache sub-line provided by the primary data
cache bank on the data bus 108 into victim data cache
line storage 160 (state 175 of Figure 9).

The victim data cache control state machine 168
does this by first determining which of the victim data
cache sub-lines is to be replaced by the MRU victim data
cache sub-line. In the case where there is a victim data
cache hit during a read, the victim data cache control
state machine 168 replaces the victim data cache
sub-line in which the hit occurred with the MRU victim data
cache sub-line. This is done because a primary data
cache bank miss occurred in the primary data cache
bank 122 that provides the MRU victim data cache
sub-line and the primary data cache line that is being read
in response from the corresponding main memory bank
into the primary data cache bank includes the victim data
cache sub-line being replaced. This is indicated by
the fact that a victim data cache hit occurred in the victim
data cache sub-line being replaced.

However, in the case where there was a victim data
cache miss, the victim data cache control state machine
168 replaces the LRU victim data cache sub-line with
the MRU victim data cache sub-line. The LRU victim data
cache sub-line is identified by the LRU flag stored by
the victim data cache tag/flag storage 164. The LRU flag
is updated by the victim data cache control state
machine each time that a victim data cache sub-line is
accessed for a read. The updated LRU flag is then provided
to the victim data cache tag/flag storage and stored
in it with storage control signals issued by the victim data
cache control state machine.
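
The choice of which victim data cache sub-line to overwrite, as described in the two preceding paragraphs, may be sketched as follows. The function and parameter names are hypothetical. The description does not state which sub-line is replaced when a victim data cache hit occurs during a write; falling back to the LRU sub-line in that case is an assumption of this sketch.

```python
from typing import Optional


def sub_line_to_replace(hit_index: Optional[int],
                        is_read: bool,
                        lru_index: int) -> int:
    """Choose the victim data cache sub-line that receives the MRU sub-line.

    hit_index is the sub-line in which a victim data cache hit occurred,
    or None on a victim data cache miss.
    """
    if hit_index is not None and is_read:
        # Hit during a read: the hit sub-line is about to be duplicated
        # in the newly read primary line, so it is the one replaced.
        return hit_index
    # Victim miss (stated in the text) or hit during a write (assumed).
    return lru_index
```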

The victim data cache control state machine 168
stores the MRU victim data cache sub-line in the buffer
162 that currently stores the victim data cache sub-line
being replaced. This is done by routing to the
corresponding latches of the buffer the 64 bit blocks received
on the data bus during the 4 cycles required to transfer
the MRU victim data cache sub-line. At the same time,
buffer control signals are issued to the corresponding
latches during the 4 cycles so as to latch the 64 bit blocks
in the buffer.

Turning again to Figures 2, 5, and 6, in each primary
data cache bank 122, after a new primary data cache
line has been read into the primary data cache bank and
an MRU victim data cache sub-line has been read into
the victim data cache 106, then a data word is read from
or written to the new primary data cache line as
described earlier (states 159 and 161 of Figure 6).

Thus, from the foregoing, the 16 primary data cache
banks 122 together form a two-way set-associative data
cache that contains 16K bytes and the victim data cache
106 is a 16-way fully-associative victim data cache.
Moreover, collectively, they form the data cache system
of the P/M device 100. Since the primary data cache
lines stored by each data cache bank are full-width and
on-chip, the cache miss rate is greatly reduced over
conventional data caches that store data cache lines that
are less than full-width. As in the instruction cache
formed by the instruction cache banks 120, this low
cache miss rate is due to the benefit of prefetching the
long data cache lines for accesses with high spatial
locality. Moreover, this miss rate is even further reduced
by the utilization of the on-chip victim data cache which
absorbs accesses with poor spatial locality. Additionally,
because of severe second order contention effects of
the kind described earlier for off-chip main memory and
on-chip instruction caches, conventional processors
with off-chip main memory and on-chip data caches are
unable to take advantage of the benefit of a full-width
data cache line.
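
The stated 16K byte capacity follows from the figures given for the exemplary embodiment, as the short calculation below illustrates. The constant names are chosen for this sketch. The victim data cache size and the number of sub-lines per line are derived here from the 256-bit sub-line width and the 16-way organization rather than quoted from the description.

```python
BANKS = 16                # primary data cache banks
LINES_PER_BANK = 2        # two-way set-associative
LINE_BITS = 4096          # full-width cache line
SUB_LINE_BITS = 256       # victim data cache sub-line
VICTIM_SUB_LINES = 16     # 16-way fully-associative (assumed 16 entries)

primary_bytes = BANKS * LINES_PER_BANK * LINE_BITS // 8
victim_bytes = VICTIM_SUB_LINES * SUB_LINE_BITS // 8
sub_lines_per_line = LINE_BITS // SUB_LINE_BITS

print(primary_bytes)       # 16384, i.e. 16K bytes
print(victim_bytes)        # 512
print(sub_lines_per_line)  # 16
```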

As those skilled in the art will recognize, numerous
alternative embodiments to the exemplary embodiment
of Figures 1-9 exist. For example, the rows in the main
memory banks 118 and the buffers 130 and 146 in the
data and instruction cache bank line storages may have
a different width than the exemplary width of 4096 bits,
but would still preferably have equal size widths. And,
each instruction cache bank and each data cache bank
could include one or more buffers. Furthermore, a victim
data cache, like the victim data cache used for the
primary data cache banks, could be used for the instruction
cache banks. Finally, rather than using the LRU policy
for determining victim data cache lines and the MRU
policy for determining a victim data cache sub-line to be
written to the victim data cache, other policies could be
used instead.

While the present invention has been described
with reference to a few specific embodiments, the
description is illustrative of the invention and is not to be
construed as limiting the invention. Various modifications
may occur to those skilled in the art without departing
from the scope of the invention.



Claims 

1. An integrated processor/memory device that
comprises:

a main memory that has a predefined address
space and comprises main memory banks,
each of the main memory banks occupying a
corresponding portion of the address space
and storing rows of words at memory locations
with addresses in the corresponding portion of
the address space, the rows being a predetermined
number of words wide;
a central processing unit (CPU) that is coupled
to each of the memory banks;
a cache comprising cache banks, each of the
cache banks being coupled to a corresponding
main memory bank of the main memory banks
and the CPU, each of the cache banks
comprising:

a cache bank line storage that is coupled
to the corresponding main memory bank,
the cache line storage storing one or more
cache lines of words, each of the cache
lines having a corresponding row in the
corresponding main memory bank, the
cache lines being the predetermined
number of words wide;
a cache bank tag storage that stores a
corresponding tag for each of the cache lines,
each of the tags identifying the row in the
corresponding memory bank of the
corresponding cache line;
cache bank logic that is coupled to the
CPU, the corresponding memory bank,
and the cache storage, the cache bank
logic, when the CPU issues an address in the
address space of the corresponding main
memory bank, determining from the
address and the tags of the cache lines
whether a cache bank hit or a cache miss
has occurred in the cache bank line
storage, the cache bank logic, when a cache
bank miss occurs, replacing a victim cache
line of the cache lines with a new cache line
that comprises the corresponding row of
the corresponding memory bank specified
by the issued address.

2. The integrated processor/memory device of claim
1 wherein the cache bank line storage comprises,
for each of the cache lines, a corresponding buffer
that stores the corresponding cache line.

3. The integrated processor/memory device of claim
1 wherein:

the cache is a data cache and the words are
data words;
in each cache bank:

the cache bank tag storage also stores a
corresponding dirty flag for each of the
cache lines, each dirty flag identifying
whether the corresponding cache line is
dirty;
the cache bank logic, when a cache bank
miss occurs in the cache bank storage and
the dirty flag of the victim cache line
indicates that the victim cache line is dirty,
writing the victim cache line to the row of the
corresponding memory bank specified by
the tag of the victim cache line prior to
replacing the victim cache line.

4. A cache for use with a central processing unit and
a main memory, the main memory having a predefined
address space and comprising main memory
banks, each of the main memory banks occupying
a corresponding portion of the address space and
storing rows of words at memory locations with
addresses in the corresponding portion of the address
space, the rows being a predetermined number of
words wide, the cache comprising:

cache banks, each of the cache banks
corresponding to a main memory bank of the main memory
banks, each of the cache banks comprising:

a cache bank line storage that stores one or
more cache lines of words, each of the cache
lines having a corresponding row in the
corresponding main memory bank, the cache lines
being the predetermined number of words
wide;
a cache bank tag storage that stores a
corresponding tag for each of the cache lines, each
of the tags identifying the row in the
corresponding memory bank of the corresponding
cache line;
cache bank logic that, when the CPU issues an
address in the address space of the
corresponding main memory bank, determines from
the address and the tags of the cache lines
whether a cache bank hit or a cache miss has
occurred in the cache bank storage, the cache
bank logic, when a cache bank miss occurs,
replacing a victim cache line of the cache lines
with a new cache line that comprises the
corresponding row of the corresponding memory
bank specified by the issued address.

5. The cache of claim 4 wherein the cache bank line 
storage comprises, for each of the cache lines, a 
corresponding buffer that stores the corresponding 
cache line. 

6. The cache of claim 4 wherein:

the cache is a data cache and the words are
data words;
in each cache bank:

the cache bank tag storage also stores a
corresponding dirty flag for each of the
cache lines, each dirty flag identifying
whether the corresponding cache line is
dirty;
the cache bank logic, when a cache bank
miss occurs in the cache storage and the
dirty flag of the victim cache line indicates
that the victim cache line is dirty, writing the
victim cache line to the row of the
corresponding memory bank specified by the
tag of the victim cache line prior to
replacing the victim cache line.

7. A method of providing a cache for use with a central
processing unit and a main memory, the main memory
having a predefined address space and comprising
main memory banks, each of the main memory
banks occupying a corresponding portion of the
address space and storing rows of words at memory
locations with addresses in the corresponding portion
of the address space, the rows being a predetermined
number of words wide, the method comprising
the steps of:

providing cache banks, each of the cache
banks corresponding to a main memory bank
of the main memory banks;
in each of the cache banks:

storing one or more cache lines of words,
each of the cache lines having a
corresponding row in the corresponding main
memory bank, the cache lines being the
predetermined number of words wide;
storing a corresponding tag for each of the
cache lines, each of the tags identifying the
row in the corresponding memory bank of
the corresponding cache line;
when the CPU issues an address in the
address space of the corresponding main
memory bank, determining from the
address and the tags of the cache lines
whether a cache bank hit or a cache miss
has occurred in the cache bank; and
when a cache miss occurs, replacing a
victim cache line of the cache lines with a new
cache line that comprises the corresponding
row of the corresponding memory bank
specified by the issued address.

8. The method of claim 7 wherein the cache line
storage step comprises, for each of the cache lines,
storing the cache line in a corresponding buffer.

9. The method of claim 7 wherein the cache is an
instruction cache and the words are instruction
words.

10. The method of claim 7 wherein:

the cache is a data cache and the words are
data words;
the method further comprises the steps of, in
each cache bank:

storing a corresponding dirty flag for each
of the cache lines, each dirty flag identifying
whether the corresponding cache line
is dirty;
when a cache miss occurs in the cache
storage and the dirty flag of the victim
cache line indicates that the victim cache
line is dirty, writing the victim cache line to
the row of the corresponding memory bank
specified by the tag of the victim cache line
prior to replacing the victim cache line.